Managing the technical quality of your site has become more complex and the number of metrics you collect has skyrocketed. Faced with hundreds of candidate metrics, how do you select those that are most meaningful? In this session you will learn which KPIs are key for successfully testing and managing your site. You will walk away with a holistic framework for managing site quality.
Metrics that Matter-Approaches To Managing High Performing Websites
1. Metrics that Matter – Approaches to Managing High Performing Web Sites
Ben Rushlo
Director, Keynote Professional Services
June 22nd, 2009
2. Agenda
User Centric System Approach
Performance Management Begins with Metrics
Metrics That Matter
Diagnostic Process
Keys to Improving Performance
Implementing A Total Site Quality Framework
3. Personal Background
9 years at Keynote – Keynote Consulting Practice
5 years at Keynote – Director of Keynote Consulting Practice
Focus on mid/large enterprise sites
Wal-Mart
eBay
Honda
Ford
Schwab
Background in capacity planning
CIS/MIS degree
5. Change Has Come
Single data center → Cloud hosting/services
HTML → ASP/JSP
JS → AJAX
Animated GIFs → Sites Completely in Flash
Content Driven → Transaction Driven → Experience Driven
US Market → Global Market
Single domains → 20 domains per page
Legacy systems → Outsourced web services
6. Change Has Come
Your user has changed
Decreased tolerance, increased expectations
Utility/Always on
Integrated completely into our lives
When Larry King is using Twitter….
When outages are front page news…
7. System Approach
“A system is a dynamic and complex whole,
interacting as a structured functional unit”
8. Online Applications Are Complex Systems
Online Application User Experience
Application Code
Content Delivery Network
Front End Design
Third Party Web Services
Network/Servers/Infrastructure
Tracking/Ad Tags
ISPs
Cloud Services
Creative/Visual Content
9. Online Applications Are Complex Systems
Application Code
JSP
ASP
DB Query
Java Environment
Front End Design
CSS Code
JavaScript Code
Browser
AJAX/XML
Threading
11. Online Applications Are Complex Systems
Web site design, technology, and architecture have changed rapidly. Has performance management changed with them?
Or are we still living in a client-server focused paradigm?
Are we viewing the discrete and disconnected elements of the system and not the system itself?
CPU/Memory/IO etc.
Garbage collection rate/threads etc.
Locks/query time etc.
12. Top Order Metrics
In any complex system, there is an overwhelming
number of metrics (things to measure that describe
elements of the system)
However, within any system there are key indicators
of system health
Think of air speed, altitude
Think of GDP or consumer confidence
Think of blood pressure and weight
13. Top Order Metrics
Top order metrics require a top down approach
It is virtually impossible to combine low level metrics
upwards to understand system health
Except for extreme cases (100% CPU, server down etc.)
Most performance management issues are not so simple
Low level metrics are very useful once you have
identified areas of focus/problem areas
14. Top Order Metrics
Performance management must begin and end at the end user’s perspective
The end user provides
A unifying approach to a very complex system
Key barometer of site/application success
A direct tie to business owner/goals and work of performance
management team
16. Data Collection
Beginning with the user’s perspective (unifying approach), how do we collect data?
Point in time?
Ongoing collection?
Data center or Internet?
Browser based?
Geographically distributed?
Connection speed?
How wide and how deep?
17. Point In Time Tools
User Feedback
YSlow
Google Page Speed
Firebug
HTTP Analyzer
HTTP Watch
KITE
Good for rules based/best practice analysis and point in time
data collection
Free or almost free!
18. What a Difference a Couple Thousand Data Points Make
Amazon Home Page
HTTP Analyzer Trace
81 requests/responses
19. What a Difference a Couple Thousand Data Points Make
Amazon – Profile
15 slowest requests (Average and variability)
2,000 data points in sample
http://www.amazon.com/
http://www.amazon.com/gp/advertising/iframeproxy?dclick=amzn.us.gw.atf;sz%3D300x250;bn%3D507846;
http://z-ecx.images-amazon.com/images/G/01/wma/clog/core2._V241266071_.js
http://g-ecx.images-amazon.com/images/G/01/gno/images/orangeBlue/navPackedSprites_v8._V245110247_.png
http://d3dtik4dz1nejo.cloudfront.net/70.html
http://z-ecx.images-amazon.com/images/G/01/nav2/gamma/amazonJQ/amazonJQ-combined-core-20620._V223529337_.
http://g-ecx.images-amazon.com/images/G/01/x-locale/common/transparent-pixel._V42752373_.gif
http://g-ecx.images-amazon.com/images/G/01/marketing/visa/321/CS2274_Amazon_Card_Images_79x80_Blue_r01._V
http://m1.2mdn.net/viewad/1511700/new_dslr_300_022709.jpg
http://g-ecx.images-amazon.com/images/G/01/gourmet/110/CC50_B0002R38XC._V235261631_.jpg
http://g-ecx.images-amazon.com/images/G/01/ui/loadIndicators/loadIndicator-large._V248199609_.gif
http://z-ecx.images-amazon.com/images/G/01/nav2/gamma/amazonJQ/amazonJQ-combined-coreCSS-59291._V24547874
http://g-ecx.images-amazon.com/images/G/01/gift-cards/topnav/giftcard-envelope-gno._V250128993_.gif
http://z-ecx.images-amazon.com/images/G/01/nav2/gamma/amazonShoveler/amazonShoveler-amazonShovelerCss-128
http://g-ecx.images-amazon.com/images/G/01/img09/sports/50/summer_toppicks_50._V225610403_.gif
[Chart: response time per request in ms (0 to 2000); series: Average, 85th percentile, 95th percentile]
20. Ongoing Measurement Approaches
Passive technology “watches” network traffic
Benefits:
Can “see” all users (huge sample, actual visitors)
Allows for “measurements” of pages that are difficult to measure in any
other way (like a purchase confirmation)
Challenges:
Security issues
Hybrid hosted sites and third party content (can’t see what is
happening with browser and external sources)
Not good for availability (a key PM activity)
Highly variable sample
21. Ongoing Measurement Approaches
Tagging technology uses JS to instrument areas on
the page with timers
Benefits:
Real user data.
Large sample
End user perspective (can include client time)
Challenges:
Requires code changes (on each page)
Lacking in granularity
Ongoing management can be cumbersome and difficult
22. Ongoing Measurement Approaches
Active technology uses synthetic transactions to
“simulate” users on the site
Benefits:
Controlled and consistent environment (only variables originate from
the site)
Repeatable
Large sample
Challenges:
Not every path can be scripted
Not every user configuration can be modeled
Choosing the “right” path can be difficult
23. Inside or Outside?
Where does the online application live?
No longer completely in the data center in most cases
Hybrid hosting, CDN, web services, third party content, third
party tags etc.
Very incomplete view of performance/quality
Where does the user live?
No users access the site from the data center
Performance management cannot be done effectively within a
LAN environment
Impact of external latency cannot be calculated
24. Multiple Locations or Not?
[Chart: response time in seconds (0 to 25) by measurement location for five measured steps; step labels as extracted: "QuickTax", "Home Page Online Get Started Validation", "Online Edition"]
Seconds per step, in chart order:
Vancouver Telus 2.00 1.16 1.23 7.31 0.56
Calgary Telus 2.41 1.50 1.42 8.80 0.56
Toronto Bell 3.48 2.84 2.38 18.98 1.09
Montreal Verizon 3.99 3.08 2.57 20.91 1.15
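The per-location times above can be reduced to one geographic variability figure per step: the slowest location divided by the fastest. A minimal Python sketch, assuming the five values in each row are per-step times in seconds in the order printed. Steps are numbered rather than named because the column labels are ambiguous in the extraction, and the function name is hypothetical.

```python
# Per-location step times in seconds, copied from the chart's data rows.
TIMES = {
    "Vancouver Telus": [2.00, 1.16, 1.23, 7.31, 0.56],
    "Calgary Telus": [2.41, 1.50, 1.42, 8.80, 0.56],
    "Toronto Bell": [3.48, 2.84, 2.38, 18.98, 1.09],
    "Montreal Verizon": [3.99, 3.08, 2.57, 20.91, 1.15],
}


def geographic_variability(times):
    """Return the slowest/fastest ratio across locations for each step."""
    steps = zip(*times.values())  # regroup the rows into per-step columns
    return [max(step) / min(step) for step in steps]


ratios = geographic_variability(TIMES)
for index, ratio in enumerate(ratios, start=1):
    print(f"step {index}: slowest location is {ratio:.2f}x the fastest")
```

On these numbers the first step sits just under 2X and the other four are above it, which is the kind of gap a single measurement location would never show.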
25. Download Time
Browser or Not?
[Chart: download time in seconds (0 to 9) per site, split into Time On Network and Client Side Processing]
Sites shown: UPS, Live, Travelocity, Wikipedia, Sprint, HotJobs, Career Builder, Disney, Fidelity, Yellow Pages, Google, AT&T, Orbitz, Merrill Lynch, MSN, eBay, Ask, CNN, Expedia, AOL, Bank Of America, Symantec, Facebook, Ticketmaster, NY Times, Apple, Hewlett-Packard, Amazon, CBS Sportsline, Verizon, Yahoo, USA Today, Dell, Walmart, Priceline.com, MSNBC, Weather.com, Charles Schwab, FedEx, Monster
26. Browser or Not?
[Chart: seconds (0.0 to 4.0) per page, split into Download Time and Time In Browser; page labels as extracted: "Photo -", "Dealer", "Home Page TL Home Video Features Specs", "Results", "Gallery"]
Seconds per page, in chart order:
Time In Browser 1.36 1.54 1.70 1.56 1.89 1.89
Download Time 1.41 0.99 0.46 0.83 0.37 1.67
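The two series above can be turned into a client-side share per page. A small Python sketch, assuming the bars are stacked, so that page time is Time In Browser plus Download Time; that reading of the chart is an assumption, and the function name is hypothetical.

```python
# Values in seconds, copied from the chart's data rows, in chart order.
TIME_IN_BROWSER = [1.36, 1.54, 1.70, 1.56, 1.89, 1.89]
DOWNLOAD_TIME = [1.41, 0.99, 0.46, 0.83, 0.37, 1.67]


def client_share(browser_seconds, download_seconds):
    """Fraction of total page time spent in the browser."""
    return browser_seconds / (browser_seconds + download_seconds)


shares = [client_share(b, d) for b, d in zip(TIME_IN_BROWSER, DOWNLOAD_TIME)]
for index, share in enumerate(shares, start=1):
    print(f"page {index}: {share:.0%} of page time is in the browser")
```

Under that assumption, roughly half or more of the time on every page is spent in the browser, which a tool that only measures HTTP download would miss entirely.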
28. Browser Or Not?
The browser is “the” application engine
JS execution
Client side processing
Dynamic content
It is almost impossible to emulate the complexity of the browser
Threading model
Blocking/Asynchronous characteristics
Dynamic JS and CSS engine
Flash/Silverlight/Flex load/dynamic paths and execution
Render related issues
30. How Wide and How Deep?
On any site there are an extremely large number of
pages that can be measured
Can’t measure everything
How do we choose?
User centric/business centric model
What are the most common and most critical paths that the
user takes throughout the site?
What pages share similar architecture/design/dependencies?
What pages/functions will wake up the CEO if they fail?
Even very large and complex sites can be measured in
two to five key business paths typically
32. Context Is Everything
Imagine if we all made up our own “goals” for
cholesterol
I consistently find performance people (CIOs, performance analysts) who just make up what they think are appropriate goals/targets for key metrics
99.999%?
97%?
A key component of any successful PM program is
context, using appropriate goals/targets
Competitive data sets are a great way to get that context
Great point of connection with business owners/objectives
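To see how far apart the two made-up targets above really are, it helps to convert each into allowed downtime. A rough Python sketch, assuming a 30-day month and a time-based definition of availability (both are assumptions; measurement tools often report availability as the share of successful tests instead). The function name is hypothetical.

```python
HOURS_PER_MONTH = 30 * 24  # assumption: a 30-day month


def allowed_downtime_minutes(availability_percent, hours=HOURS_PER_MONTH):
    """Minutes of downtime per month that still meet the availability target."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * hours * 60


for target in (99.999, 97.0):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% allows about {minutes:.1f} minutes of downtime per month")
```

One target allows well under a minute per month and the other allows more than twenty hours, which is why a target without competitive context tells the business very little.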
36. Variability Is Very Important
[Chart: Render Time Statistical Summary, in seconds (0 to 12), per transaction step; step labels as extracted: "Interval Login Click Exchange Search - Orlando Submit Search - Cancun International|Resort,"; series: Arithmetic Mean, Geometric Mean, Median, 85th Percentile, 95th Percentile]
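The five statistics named in the chart can be computed from raw samples in a few lines. A minimal Python sketch using only the standard library; the sample data is hypothetical, and the percentile uses the nearest-rank method, which is one of several common definitions.

```python
import statistics


def percentile(samples, percent):
    """Nearest-rank percentile; percent is an integer from 1 to 100."""
    ordered = sorted(samples)
    rank = max(1, -(-percent * len(ordered) // 100))  # integer ceiling
    return ordered[rank - 1]


def summarize(samples):
    """Return the summary statistics shown in the chart for one step."""
    return {
        "arithmetic_mean": statistics.fmean(samples),
        "geometric_mean": statistics.geometric_mean(samples),
        "median": statistics.median(samples),
        "p85": percentile(samples, 85),
        "p95": percentile(samples, 95),
    }


# Hypothetical render times in seconds: mostly fast, with a slow tail.
render_times = [2.0] * 16 + [3.0, 4.0, 8.0, 12.0]
summary = summarize(render_times)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The slow tail pulls the arithmetic mean well above the median, while the geometric mean moves less; the 85th and 95th percentiles show how bad the experience is for the unlucky users.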
37. Client Side Processing
Client side processing is virtually unexamined in most
performance management programs
Not tracked by most tools
Only beginning to be discussed as part of performance
management
Yet for many sites this is the key contributor to poor
performance
38. Core PM Metrics
To impact and improve user centric performance,
focus on 9 core metrics:
Availability
Outages
Average Download Time - Geo Mean
Time in Client Versus Time In Generation/Backend
Variability - 85th and 95th percentiles
Geographic Variability
Hourly Variability (Load Handling)
Third Party Quality
Size/Element Count/Domains
39. Core PM Metrics
Availability – 99.5% for multi-step transaction
Outages – 1 hour per month
Average Download Time – 1.5 to 2.5 s (broadband)
Time in Client Versus Time In Generation/Backend – Less than 30% of page load
Variability (85th and 95th percentiles) – No more than 1.5X the median
Geographic Variability – No more than 2X (fastest versus slowest)
Hourly Variability (Load Handling) – Less than 20% peak versus off peak
Third Party Quality – Tags under 50 ms each (limited variability, good availability)
Size/Element Count/Domains – Depends!
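The targets on this slide can be written down as simple checks and applied to whatever has been measured. A hedged Python sketch: the metric names and sample measurements are hypothetical, the client-time target is read as "client time under 30% of page load" (an interpretation), and Size/Element Count/Domains is left out because the slide gives no fixed target for it.

```python
# Targets transcribed from the slide. Each entry maps a metric name
# (names are hypothetical) to a check that returns True when on target.
TARGETS = {
    "availability_pct": lambda v: v >= 99.5,
    "outage_hours_per_month": lambda v: v <= 1.0,
    "download_time_s": lambda v: v <= 2.5,
    "client_time_share": lambda v: v < 0.30,  # assumed reading of the slide
    "p85_over_median": lambda v: v <= 1.5,
    "p95_over_median": lambda v: v <= 1.5,
    "geo_slowest_over_fastest": lambda v: v <= 2.0,
    "peak_vs_offpeak_increase": lambda v: v < 0.20,
    "slowest_tag_ms": lambda v: v < 50,
}


def out_of_target(measured):
    """Return the sorted names of measured metrics that miss their target."""
    return sorted(
        name
        for name, on_target in TARGETS.items()
        if name in measured and not on_target(measured[name])
    )


# Hypothetical measurements for one business path.
measured = {
    "availability_pct": 99.7,
    "download_time_s": 3.1,
    "client_time_share": 0.49,
    "geo_slowest_over_fastest": 1.8,
}
flagged = out_of_target(measured)
print(flagged)
```

Metrics that were not measured are skipped rather than failed, so the same check can run on a partial data set while collection is still being rolled out.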
46. Diagnostic Process
Begin with standards and good “change based” alerting
Are the metrics out of threshold (based on context)?
Or have they changed from where they have been?
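The two questions above map onto two separate checks: an absolute threshold and a change from recent history. A minimal Python sketch, assuming a metric where higher is worse (such as response time in seconds), a median of recent samples as the baseline, and a 25% change limit; the limit, the baseline choice, and the function name are all assumptions for illustration.

```python
import statistics


def check_metric(history, latest, threshold, max_change=0.25):
    """Return alert reasons for the latest value of a higher-is-worse metric."""
    alerts = []
    if latest > threshold:
        alerts.append("out of threshold")
    baseline = statistics.median(history)
    if baseline > 0 and (latest - baseline) / baseline > max_change:
        alerts.append("changed from baseline")
    return alerts


# Hypothetical recent response times in seconds and a 2.5 s target.
recent = [1.0, 1.1, 0.9, 1.0, 1.0]
print(check_metric(recent, latest=1.05, threshold=2.5))
print(check_metric(recent, latest=1.6, threshold=2.5))
print(check_metric(recent, latest=3.0, threshold=2.5))
```

The middle case is the interesting one: the page is still inside its target, but it has slowed sharply from where it has been, which is often the earliest sign of a problem.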
48. Diagnostic Process
Do you see consistent performance problems over
time?
If so, the page needs to be profiled to determine
Content (CDN or web server quality)
Application
Front-end design (e.g. Third party calls)
If so, has something changed?
New content? New requests?
Is there a time of day/hour/location pattern?
Capacity
Edge cache
ISP issue
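A time of day pattern is easiest to spot by grouping samples by hour and comparing the slowest hour with the fastest. A rough Python sketch, assuming samples arrive as (hour of day, seconds) pairs; the data and function names are hypothetical.

```python
import statistics
from collections import defaultdict


def hourly_medians(samples):
    """Group (hour, seconds) samples by hour and return the median per hour."""
    by_hour = defaultdict(list)
    for hour, seconds in samples:
        by_hour[hour].append(seconds)
    return {hour: statistics.median(values) for hour, values in sorted(by_hour.items())}


def peak_vs_offpeak(medians):
    """Relative slowdown of the slowest hour compared with the fastest hour."""
    fastest = min(medians.values())
    slowest = max(medians.values())
    return (slowest - fastest) / fastest


# Hypothetical samples: the evening hour is noticeably slower.
samples = [(3, 2.0), (3, 2.2), (12, 2.1), (12, 2.3), (20, 3.0), (20, 3.2)]
medians = hourly_medians(samples)
slowdown = peak_vs_offpeak(medians)
print(medians)
print(f"peak is {slowdown:.0%} slower than off peak")
```

A slowdown that tracks traffic points toward capacity; the same grouping by measurement location instead of hour helps separate edge cache and ISP issues.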
50. Diagnostic Process
Errors
Categorize by type
Network
Server
Application
Tool should have actual (not simulated) screen capture
Tool should use a browser
Many errors (most) are custom application or malformed pages
Browser is much better at catching errors than an “HTTP Request/Response Tool” because it is more sensitive to dynamic, real world issues
52. Overuse of Modular JS/CSS
Silo versus user “flow” based approach
JS and CSS have no strategy for minimizing separate and
isolated files
Need to take into account “flow” of user throughout site
Combination of JS and CSS is key
Reduces roundtrips
Lessen impact of single threading on JS
Combination (or packing) of files more critical than minification
Key: Combine JS/CSS. Think Paths not Pages.
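Combining files can be as simple as concatenating them in load order at build time. A minimal Python sketch, not a replacement for a real build tool: the function names, file contents, and the idea of grouping by user path are illustrative assumptions, and no minification is attempted.

```python
from pathlib import Path


def combine_sources(sources, separator="\n"):
    """Join the text of several JS (or CSS) files, preserving load order."""
    return separator.join(source.strip() for source in sources)


def combine_files(paths, output_path, separator="\n"):
    """Read the given files in order and write one combined file."""
    sources = [Path(path).read_text(encoding="utf-8") for path in paths]
    Path(output_path).write_text(combine_sources(sources, separator), encoding="utf-8")


# Hypothetical scripts needed along one user path through the site.
bundle = combine_sources(["var a = 1;", "var b = 2;\n", "var c = a + b;"])
print(bundle)
```

Grouping by path rather than by page means the user pays for one request early in the flow and then hits the browser cache on every later page.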
53. JS Placement
JavaScript files load one file at a time
None of these images were downloaded to the
browser until 2.4 seconds into a 2.8 second page
load
Key: Combine, Move Down External JS
54. Roundtrips
Myth in front end design that page size/asset size is still significant
Reducing cookie “overhead”
GZip
Minification
Image optimization
Etc
These are best practices but they cannot compare to
the criticality of round trips
Network speed much more critical than bandwidth (above 3.0 Mbps)
Key: Reduce roundtrips. CSS Sprite for static content
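A back-of-the-envelope model shows why roundtrips dominate once bandwidth is adequate. This Python sketch is a deliberately simplified assumption, not a browser model: it treats requests as loading in waves over a fixed number of parallel connections, charges one round trip per wave, and adds raw transfer time. The page size, latency, and connection count are hypothetical; the 81 requests come from the Amazon home page trace earlier in the deck.

```python
import math


def estimated_load_seconds(requests, total_bytes, rtt_seconds, bandwidth_bps, connections=2):
    """Simplified estimate: one round trip per wave of requests plus transfer time."""
    waves = math.ceil(requests / connections)
    latency_cost = waves * rtt_seconds
    transfer_cost = total_bytes * 8 / bandwidth_bps
    return latency_cost + transfer_cost


# Hypothetical page: 81 requests, 500 KB, 80 ms round trip, 3 Mbps.
baseline = estimated_load_seconds(81, 500_000, 0.08, 3_000_000)
double_bandwidth = estimated_load_seconds(81, 500_000, 0.08, 6_000_000)
half_requests = estimated_load_seconds(40, 500_000, 0.08, 3_000_000)

print(f"baseline:          {baseline:.2f} s")
print(f"double bandwidth:  {double_bandwidth:.2f} s")
print(f"half the requests: {half_requests:.2f} s")
```

Under these assumptions, halving the request count saves more time than doubling the bandwidth, which is the argument for combining files and using sprites before chasing bytes.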
63. Implementing Total Site Quality Framework
Begin with the user centric approach
Apply competitive context and business goals to create
appropriate targets
Collect 9 core PM metrics
Use an ongoing, external, geographically distributed, browser
based solution to collect data
Path based, key pages/function approach
Apply collected data against targets
Flag change/target exceeded
Perform diagnostic process